{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Benchmarking with sktime\n",
    "\n",
    "The benchmarking modules allows you to easily orchestrate benchmarking experiments in which you want to compare the performance of one or more algorithms over one or more data sets. It also provides a number of statistical tests to check if observed performance differences are statistically significant.\n",
    "\n",
    "The benchmarking modules is based on [mlaut](https://github.com/alan-turing-institute/mlaut).\n",
    "\n",
    "## Preliminaries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-12-19T14:29:56.286270Z",
     "iopub.status.busy": "2020-12-19T14:29:56.285730Z",
     "iopub.status.idle": "2020-12-19T14:29:57.240898Z",
     "shell.execute_reply": "2020-12-19T14:29:57.241403Z"
    },
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "# import required functions and classes\n",
    "import os\n",
    "\n",
    "from sklearn.metrics import accuracy_score\n",
    "\n",
    "from sktime.benchmarking.data import UEADataset, make_datasets\n",
    "from sktime.benchmarking.evaluation import Evaluator\n",
    "from sktime.benchmarking.metrics import PairwiseMetric\n",
    "from sktime.benchmarking.orchestration import Orchestrator\n",
    "from sktime.benchmarking.results import HDDResults\n",
    "from sktime.benchmarking.strategies import TSCStrategy\n",
    "from sktime.benchmarking.tasks import TSCTask\n",
    "from sktime.classification.frequency_based import RandomIntervalSpectralForest\n",
    "from sktime.classification.interval_based import TimeSeriesForest\n",
    "from sktime.series_as_features.model_selection import PresplitFilesCV"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Set up paths"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-12-19T14:29:57.245344Z",
     "iopub.status.busy": "2020-12-19T14:29:57.244681Z",
     "iopub.status.idle": "2020-12-19T14:29:57.246812Z",
     "shell.execute_reply": "2020-12-19T14:29:57.247148Z"
    }
   },
   "outputs": [],
   "source": [
    "# set up paths to data and results folder\n",
    "import sktime\n",
    "\n",
    "DATA_PATH = os.path.join(os.path.dirname(sktime.__file__), \"datasets/data\")\n",
    "RESULTS_PATH = \"results\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create pointers to datasets on hard drive\n",
    "Here we use the `UEADataset` which follows the [UEA/UCR format](http://www.timeseriesclassification.com) and some of the time series classification datasets included in sktime."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-12-19T14:29:57.250551Z",
     "iopub.status.busy": "2020-12-19T14:29:57.250073Z",
     "iopub.status.idle": "2020-12-19T14:29:57.252024Z",
     "shell.execute_reply": "2020-12-19T14:29:57.252506Z"
    }
   },
   "outputs": [],
   "source": [
    "# Create individual pointers to dataset on the disk\n",
    "datasets = [\n",
    "    UEADataset(path=DATA_PATH, name=\"ArrowHead\"),\n",
    "    UEADataset(path=DATA_PATH, name=\"ItalyPowerDemand\"),\n",
    "]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-12-19T14:29:57.256056Z",
     "iopub.status.busy": "2020-12-19T14:29:57.255556Z",
     "iopub.status.idle": "2020-12-19T14:29:57.257306Z",
     "shell.execute_reply": "2020-12-19T14:29:57.257793Z"
    }
   },
   "outputs": [],
   "source": [
    "# Alternatively, we can use a helper function to create them automatically\n",
    "datasets = make_datasets(\n",
    "    path=DATA_PATH, dataset_cls=UEADataset, names=[\"ArrowHead\", \"ItalyPowerDemand\"]\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### For each dataset, we also need to specify a learning task\n",
    "The learning task encapsulate all the information and instructions that define the problem we're trying to solve. In our case, we're trying to solve classification tasks and the key information we need is the name of the target variable in the data set that we're trying to predict. Here all tasks are the same because the target variable has the same name in all data sets."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-12-19T14:29:57.261038Z",
     "iopub.status.busy": "2020-12-19T14:29:57.260523Z",
     "iopub.status.idle": "2020-12-19T14:29:57.262423Z",
     "shell.execute_reply": "2020-12-19T14:29:57.262902Z"
    }
   },
   "outputs": [],
   "source": [
    "tasks = [TSCTask(target=\"target\") for _ in range(len(datasets))]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Specify learning strategies\n",
    "Having set up the data sets and corresponding learning tasks, we need to define the algorithms we want to evaluate and compare."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-12-19T14:29:57.266097Z",
     "iopub.status.busy": "2020-12-19T14:29:57.265617Z",
     "iopub.status.idle": "2020-12-19T14:29:57.267246Z",
     "shell.execute_reply": "2020-12-19T14:29:57.267737Z"
    },
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "# Specify learning strategies\n",
    "strategies = [\n",
    "    TSCStrategy(TimeSeriesForest(n_estimators=10), name=\"tsf\"),\n",
    "    TSCStrategy(RandomIntervalSpectralForest(n_estimators=10), name=\"rise\"),\n",
    "]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Set up a results object\n",
    "The results object encapsulates where and how benchmarking results are stored, here we choose to output them to the hard drive."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-12-19T14:29:57.270574Z",
     "iopub.status.busy": "2020-12-19T14:29:57.270022Z",
     "iopub.status.idle": "2020-12-19T14:29:57.274904Z",
     "shell.execute_reply": "2020-12-19T14:29:57.275397Z"
    }
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/mloning/Documents/Research/software/sktime/sktime/sktime/benchmarking/base.py:153: UserWarning: Existing results file found in given path: results. Results file will be updated\n",
      "  warn(f\"Existing results file found in given path: {path}. \"\n"
     ]
    }
   ],
   "source": [
    "# Specify results object which manages the output of the benchmarking\n",
    "results = HDDResults(path=RESULTS_PATH)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Run benchmarking\n",
    "Finally, we pass all specifications to the orchestrator. The orchestrator will automatically train and evaluate all algorithms on all data sets and write out the results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-12-19T14:29:57.280362Z",
     "iopub.status.busy": "2020-12-19T14:29:57.278273Z",
     "iopub.status.idle": "2020-12-19T14:30:04.878619Z",
     "shell.execute_reply": "2020-12-19T14:30:04.878959Z"
    }
   },
   "outputs": [],
   "source": [
    "# run orchestrator\n",
    "orchestrator = Orchestrator(\n",
    "    datasets=datasets,\n",
    "    tasks=tasks,\n",
    "    strategies=strategies,\n",
    "    cv=PresplitFilesCV(),\n",
    "    results=results,\n",
    ")\n",
    "orchestrator.fit_predict(save_fitted_strategies=False, overwrite_predictions=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Evaluate and compare results\n",
    "Having run the orchestrator, we can evaluate and compare the prediction strategies."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-12-19T14:30:04.891964Z",
     "iopub.status.busy": "2020-12-19T14:30:04.891378Z",
     "iopub.status.idle": "2020-12-19T14:30:05.132709Z",
     "shell.execute_reply": "2020-12-19T14:30:05.133194Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>strategy</th>\n",
       "      <th>accuracy_mean</th>\n",
       "      <th>accuracy_stderr</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>rise</td>\n",
       "      <td>0.818134</td>\n",
       "      <td>0.020871</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>tsf</td>\n",
       "      <td>0.843907</td>\n",
       "      <td>0.019889</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  strategy  accuracy_mean  accuracy_stderr\n",
       "0     rise       0.818134         0.020871\n",
       "1      tsf       0.843907         0.019889"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "evaluator = Evaluator(results)\n",
    "metric = PairwiseMetric(func=accuracy_score, name=\"accuracy\")\n",
    "metrics_by_strategy = evaluator.evaluate(metric=metric)\n",
    "metrics_by_strategy.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "The evaluator offers a number of additional methods for evaluating and comparing strategies, including statistical hypothesis tests and visualisation tools, for example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-12-19T14:30:05.138627Z",
     "iopub.status.busy": "2020-12-19T14:30:05.138151Z",
     "iopub.status.idle": "2020-12-19T14:30:05.147096Z",
     "shell.execute_reply": "2020-12-19T14:30:05.147584Z"
    },
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>strategy</th>\n",
       "      <th>accuracy_mean_rank</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>rise</td>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>tsf</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  strategy  accuracy_mean_rank\n",
       "0     rise                 2.0\n",
       "1      tsf                 1.0"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "evaluator.rank()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Currently, the following functions are implemented:\n",
    "\n",
    "* `evaluator.plot_boxplots()`\n",
    "* `evaluator.ranks()`\n",
    "* `evaluator.t_test()`\n",
    "* `evaluator.sign_test()`\n",
    "* `evaluator.ranksum_test()`\n",
    "* `evaluator.t_test_with_bonferroni_correction()`\n",
    "* `evaluator.wilcoxon_test()`\n",
    "* `evaluator.friedman_test()`\n",
    "* `evaluator.nemenyi()`"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
